Abstract

Research into the automatic acquisition of lex- 
ical information from corpora is starting to 
produce large-scale computational lexicons con- 
taining data on the relative frequencies of sub- 
categorisation alternatives for individual verbal 
predicates. However, the empirical question of 
whether this type of frequency information can 
in practice improve the accuracy of a statistical 
parser has not yet been answered. In this paper 
we describe an experiment with a wide-coverage 
statistical grammar and parser for English and 
subcategorisation frequencies acquired from ten 
million words of text which shows that this
information can significantly improve parse
accuracy.

1 Introduction 

Recent work on the automatic acquisition of
lexical information from substantial amounts of
machine-readable text (e.g. Briscoe & Carroll,
1997; Gahl, 1998; Carroll & Rooth, 1998) has
opened up the possibility of producing large-scale
computational lexicons containing data on the
relative frequencies of subcategorisation
alternatives for individual verbal predicates.
However, although Resnik (1992), Schabes (1992),
Carroll & Weir (1997) and others have proposed
'lexicalised' probabilistic grammars to improve
the accuracy of parse ranking, no wide-coverage
parser has yet been constructed which explicitly
incorporates probabilities of different
subcategorisation alternatives for individual
predicates. It is therefore an open question
whether this type of information can actually
improve parser accuracy in practice.

In this paper we address this issue, describing 
an experiment with an existing wide-coverage 
statistical grammar and parser for English
(Carroll & Briscoe, 1996) in conjunction with
subcategorisation frequencies acquired from 10
million words of text from the British National
Corpus (BNC; Leech, 1992). Our results show 
conclusively that this information can improve 
parse accuracy. 

2 Background 

2.1 Subcategorisation Acquisition 

Several substantial machine-readable subcate- 
gorisation dictionaries exist for English, either 
built semi-automatically from machine-readable 
versions of conventional learners' dictionaries, 
or manually by (computational) linguists (e.g. 
the Alvey NL Tools (ANLT) dictionary, Bogu- 
raev et al. (1987); the COMLEX Syntax dic- 
tionary, Grishman, Macleod & Meyers (1994)). 
However, since these efforts were not carried out 
in tandem with rigorous large-scale classifica- 
tion of corpus data, none of the resources pro- 
duced provide useful information on the relative 
frequency of different subcategorisation frames. 

Systems which are able to acquire a small 
number of verbal subcategorisation classes au- 
tomatically from corpus text have been de- 
scribed by Brent (1991, 1993), and Ushioda 
et al. (1993). Ushioda et al. also derive rel- 
ative subcategorisation frequency information 
for individual predicates. In this work they 
utilise a part-of-speech (PoS) tagged corpus and 
finite-state NP parser to recognise and calculate
the relative frequency of six subcategorisation
classes. They report that for 32 out of 33 verbs 
tested their system correctly predicts the most 
frequent class, and for 30 verbs it correctly pre- 
dicts the second most frequent class, if there was 
one. 

Manning (1993) reports a larger experiment, 
also using a PoS tagged corpus and a finite-state 
NP parser, attempting to recognise sixteen dis- 
tinct complementation patterns — although not 
with relative frequencies. In a comparison be- 
tween entries for 40 common verbs acquired 
from 4.1 million words of text and the entries 
given in the Oxford Advanced Learner's Dictio- 
nary of Current English (Hornby, 1989) Man- 
ning's system achieves a precision of 90% and a 
recall of 43%. 

Gahl (1998) presents an extraction tool for 
use with the BNC that is able to create sub- 
corpora containing different subcategorisation 
frames for verbs, nouns and adjectives, given 
the frames expected for each predicate. The 
tool is based on a set of regular expressions 
over PoS tags, lemmas, morphosyntactic tags 
and sentence boundaries, effectively performing 
the same function as a chunking parser (cf.
Abney, 1996). The resulting subcorpora can be
used to determine the (relative) frequencies of 
the frames. 

Carroll & Rooth (1998) use an iterative ap- 
proach to estimate the distribution of subcat- 
egorisation frames given head words, starting 
from a manually-developed context-free gram- 
mar (of English). First, a probabilistic ver- 
sion of the grammar is trained from a text cor- 
pus using the expectation-maximisation (EM) 
algorithm, and the grammar is lexicalised on 
rule heads. The EM algorithm is then run 
again to calculate the expected frequencies of a 
head word accompanied by a particular frame. 
These probabilities can then be fed back into 
the grammar for the next iteration. Carroll & 
Rooth report encouraging results for three verbs 
based on applying the technique to text from 
the BNC. 

Briscoe & Carroll (1997) describe a system
capable of distinguishing 160 verbal subcate- 
gorisation classes — a superset of those found in 
the ANLT and COMLEX Syntax dictionaries — 
returning relative frequencies for each frame 
found for each verb. The classes also incorporate
information about control of predicative
arguments and alternations such as particle
movement and extraposition. The approach uses a
robust statistical parser which yields complete 
though 'shallow' parses, a comprehensive sub- 
categorisation class classifier, and a priori esti- 
mates of the probability of membership of these 
classes. For a sample of seven verbs with multi- 
ple subcategorisation possibilities the system's 
frequency rankings averaged 81% correct. (We
discuss this system further in section 3.2 below,
describing how we used it to provide frequency
data for our experiment.)

2.2 Lexicalised Statistical Parsing 

Carroll & Weir (1997) — without actually build- 
ing a parsing system — address the issue of how 
frequency information can be associated with 
lexicalised grammar formalisms, using Lexical- 
ized Tree Adjoining Grammar (Joshi & Schabes, 
1991) as a unifying framework. They consider 
systematically a number of alternative proba- 
bilistic formulations, including those of Resnik 
(1992) and Schabes (1992) and implemented 
systems based on other underlying grammati- 
cal frameworks, evaluating their adequacy from 
both a theoretical and empirical perspective in 
terms of their ability to model particular distri- 
butions of data that occur in existing treebanks. 

Magerman (1995), Collins (1996), Ratnaparkhi
(1997), Charniak (1997) and others describe
implemented systems with impressive accuracy on
parsing unseen data from the Penn Treebank
(Marcus, Santorini & Marcinkiewicz,
1993). These parsers model probabilistically 
the strengths of association between heads of 
phrases, and the configurations in which these 
lexical associations occur. The accuracies re- 
ported for these systems are substantially bet- 
ter than their (non-lexicalised) probabilistic 
context-free grammar analogues, demonstrat- 
ing clearly the value of lexico-statistical infor- 
mation. However, since the grammatical de- 
scriptions are induced from atomic-labeled con- 
stituent structures in the training treebank, 
rather than coming from an explicit generative 
grammar, these systems do not make contact 
with traditional notions of argument structure 
(i.e. subcategorisation, selectional preferences of 
predicates for complements) in any direct sense. 
So although it is now possible to extract at least
subcategorisation data from large corpora^1 with
some degree of reliability, it would be difficult
to integrate the data into this type of parsing
system.

^1 Grishman & Sterling (1992), Poznanski & Sanfilippo
(1993), Resnik (1993), Ribas (1994), McCarthy (1997)
and others have shown that it is possible also to
acquire selection preferences automatically from
(partially) parsed data.

Briscoe & Carroll (1997) present a small-scale
experiment in which subcategorisation class
frequency information for individual verbs was
integrated into a robust statistical
(non-lexicalised) parser. The experiment used a
test corpus of 250 sentences, and used the standard
GEIG bracket precision, recall and crossing measures
(Grishman, Macleod & Sterling, 1992) for evaluation.
While bracket precision and recall were virtually
unchanged, the crossing bracket score for the
lexicalised parser showed a 7% improvement. However,
this difference turned out not to be statistically
significant at the 95% level: some analyses got
better while others got worse.

We have performed a similar, but much larger
scale experiment, which we describe below. We
used a larger test corpus, acquired data from
an acquisition corpus an order of magnitude
larger, and used a different quantitative
evaluation measure that we argue is more sensitive
to argument/adjunct and attachment distinctions.
We summarise the main features of the 'baseline'
parsing system in section 3.1, describe how we
lexicalised it (section 3.2), present the results
of the quantitative evaluation (section 3.3), give
a qualitative analysis of the analysis errors made
(section 3.4), and conclude with directions for
future work.

3 The Experiment

3.1 The Baseline Parser

The baseline parsing system comprises:

• an HMM part-of-speech tagger (Elworthy,
1994), which produces either the single
highest-ranked tag for each word, or multiple
tags with associated forward-backward
probabilities (which are used with a threshold
to prune lexical ambiguity);

• a robust finite-state lemmatiser for English,
an extended and enhanced version of the
University of Sheffield GATE system
morphological analyser (Cunningham et al.,
1995);

• a wide-coverage unification-based 'phrasal'
grammar of English PoS tags and punctuation;



• a fast generalised LR parser using this 
grammar, taking the results of the tagger as 
input, and performing disambiguation us- 
ing a probabilistic model similar to that of 
Briscoe & Carroll (1993); and 

• training and test treebanks (of 4600 and 
500 sentences respectively) derived semi- 
automatically from the SUSANNE corpus 
(Sampson, 1995).
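
The thresholded use of multiple tags in the tagger component can be pictured with a small sketch. It assumes (this is not specified above) that a tag is kept when its forward-backward probability is at least a fixed fraction of the best tag's probability for that word. The function name, the threshold value and the example tags and probabilities are all hypothetical.

```python
def prune_tags(tag_probs, threshold=0.1):
    """Keep tags whose probability is within a factor `threshold` of the best tag.

    tag_probs: dict mapping PoS tag -> forward-backward probability for one word.
    The relative-threshold rule is an assumption made for illustration only.
    """
    best = max(tag_probs.values())
    kept = {t: p for t, p in tag_probs.items() if p >= threshold * best}
    # Return the surviving tags, most probable first.
    return sorted(kept, key=kept.get, reverse=True)


# Hypothetical probabilities for one ambiguous word.
print(prune_tags({"NN1": 0.70, "VVB": 0.28, "JJ": 0.02}))
```

A stricter threshold passes fewer alternatives on to the parser, trading lexical ambiguity against the risk of discarding the correct tag.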

The grammar consists of 455 phrase struc- 
ture rule schemata in the format accepted by 
the parser (a syntactic variant of a Definite 
Clause Grammar with iterative (Kleene)
operators). It is 'shallow' in that no attempt
is made to fully analyse unbounded dependencies.
However, the distinction between arguments and
adjuncts is expressed, following X-bar theory, by
Chomsky-adjunction to maximal projections of
adjuncts (XP → XP Adjunct) as opposed to
'government' of arguments (i.e. arguments are
sisters within X1 projections; X1 → X0 Arg1 ...
ArgN). Furthermore, all analyses are rooted (in S)
so the grammar assigns global, shallow and often
'spurious' analyses to many sentences.
Currently, the coverage
of this grammar — the proportion of sentences 
for which at least one analysis is found — is 79% 
when applied to the SUSANNE corpus, a 138K 
word treebanked and balanced subset of the 
Brown corpus. 

Inui et al. (1997) have recently proposed a 
novel model for probabilistic LR parsing which 
they justify as theoretically more consistent and 
principled than the Briscoe & Carroll (1993) 
model. We use this new model since we have 
found that it indeed also improves disambigua- 
tion accuracy. 

The 500-sentence test corpus consists only of 
in-coverage sentences, and contains a mix of 
written genres: news reportage (general and 
sports), belles lettres, biography, memoirs, and
scientific writing. The mean sentence length is 
19.3 words (including punctuation tokens). 

3.2 Incorporating Acquired Subcategorisation Information

The test corpus contains a total of 485 distinct
verb lemmas. We ran the Briscoe & Carroll
(1997) subcategorisation acquisition system on
the first 10 million words of the BNC, for each of
these verbs saving the first 1000 cases in which
a possible instance of a subcategorisation frame
was identified. For each verb the acquisition
system hypothesised a set of lexical entries
corresponding to frames for which it found enough
evidence. Over the complete set of verbs we
ended up with a total of 5228 entries, each with
an associated frequency normalised with respect
to the total number of frames for all hypothesised
entries for the particular verb.

AP            NP_PP_PP   PP_WHPP    VPINF
NONE          NP_SCOMP   PP_WHS     VPING
NP            NP_WHPP    PP_WHVP    VPING_PP
NP_AP         PP         SCOMP      VPPRT
NP_NP         PP_AP      SINF       WHPP
NP_NP_SCOMP   PP_PP      SING
NP_PP         PP_SCOMP   SING_PP
NP_PPOF       PP_VPINF   VPBSE

Table 1: VSUBCAT values in the grammar.

In the experiment each acquired lexical en- 
try was assigned a probability based on its nor- 
malised frequency, with smoothing — to allow for 
unseen events — using the (comparatively crude) 
add-1 technique. We did not use the lexical en- 
tries themselves during parsing, since missing 
entries would have compromised coverage. In- 
stead, we factored in their probabilities during 
parse ranking at the end of the parsing process. 
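
As a minimal sketch of add-1 smoothing in this setting, the following assumes raw per-verb frame counts and a fixed inventory of possible frames. Both the counts and the four-frame inventory are hypothetical, and the code is an illustration of the general technique rather than the procedure actually used in the experiment.

```python
def add_one_probabilities(frame_counts, all_frames):
    """Add-1 (Laplace) smoothed frame probabilities for a single verb.

    frame_counts: dict frame -> number of observed instances (hypothetical counts).
    all_frames:   every frame that could occur, including unseen ones.
    """
    # Each possible frame receives one extra pseudo-count.
    total = sum(frame_counts.get(f, 0) for f in all_frames) + len(all_frames)
    return {f: (frame_counts.get(f, 0) + 1) / total for f in all_frames}


frames = ["NONE", "NP", "NP_PP", "SCOMP"]   # small illustrative inventory
probs = add_one_probabilities({"NP": 6, "SCOMP": 2}, frames)
print(probs)
```

Unseen frames (here NONE and NP_PP) receive a small non-zero probability, so an analysis using a frame that was never acquired for a verb is penalised rather than ruled out.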

We ranked complete derivations based on the
product of (1) the (purely structural) derivation
probability according to the probabilistic
LR model, and (2) for each verb instance in
the derivation the probability of the verbal
lexical entry that would be used in the particular
analysis context. The entry was located via
the VSUBCAT value assigned to the verb in the
analysis by the immediately dominating verbal
phrase structure rule in the grammar: VSUBCAT
values are also present in the lexical entries
since they were acquired using the same grammar.
Table 1 lists the VSUBCAT values. The
values are mostly self-explanatory; however,
examples of some of the less obvious ones are given
in (1).

(1) They made (NP_WHPP) a great fuss about
    what to do.
    They admitted (PP_SCOMP) to the authorities
    that they had entered illegally.
    It dawned (PP_WHS) on him what he should
    do.

Some VSUBCAT values correspond to several of
the 160 subcategorisation classes distinguished
by the acquisition system. In these cases the 
sum of the probabilities of the corresponding 
entries was used. The finer distinctions stem 
from the use by the acquisition system of ad- 
ditional information about classes of specific 
prepositions, particles and other function words 
appearing within verbal frames. In this experi- 
ment we ignored these distinctions. 
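
The summing step can be sketched as follows. The fine-grained class names, the mapping and the probabilities are hypothetical and stand in for the acquisition system's actual classes.

```python
from collections import defaultdict


def collapse_to_vsubcat(class_probs, class_to_vsubcat):
    """Sum the probabilities of fine-grained classes that share a VSUBCAT value."""
    collapsed = defaultdict(float)
    for cls, p in class_probs.items():
        collapsed[class_to_vsubcat[cls]] += p
    return dict(collapsed)


# Hypothetical classes for one verb: two differ only in the preposition involved.
mapping = {"NP-PP(to)": "NP_PP", "NP-PP(for)": "NP_PP", "NP": "NP"}
probs = {"NP-PP(to)": 0.25, "NP-PP(for)": 0.125, "NP": 0.625}
print(collapse_to_vsubcat(probs, mapping))
```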

In taking the product of the derivation and 
subcategorisation probabilities we have lost 
some of the properties of a statistical language 
model. The product is no longer strictly a prob- 
ability, although we do not attempt to use it 
as such: we use it merely to rank competing 
analyses. Better integration of these two sets of 
probabilities is an area which requires further 
investigation. 
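
The ranking by product can be sketched as below. The derivation names, structural probabilities and entry probabilities are hypothetical; only the form of the score (structural probability multiplied by one lexical entry probability per verb instance) follows the description above.

```python
import math


def rank_derivations(derivations, subcat_prob):
    """Rank derivations by structural probability times verb-entry probabilities.

    derivations: list of (name, structural_prob, [(verb, vsubcat), ...]).
    subcat_prob: dict (verb, vsubcat) -> smoothed probability.
    Log space avoids underflow; the ordering is the same as for the raw product.
    """
    def score(d):
        _, p_struct, verb_frames = d
        return math.log(p_struct) + sum(math.log(subcat_prob[vf]) for vf in verb_frames)

    return [d[0] for d in sorted(derivations, key=score, reverse=True)]


# Hypothetical numbers for two competing analyses of 'hear a greeting from ...'.
subcat = {("hear", "NP"): 0.6, ("hear", "NP_PP"): 0.05}
candidates = [
    ("PP as argument", 0.004, [("hear", "NP_PP")]),
    ("PP inside NP", 0.003, [("hear", "NP")]),
]
print(rank_derivations(candidates, subcat))
```

With these illustrative values the structurally less probable analysis wins once the entry probabilities are factored in, which is the kind of re-ranking the experiment relies on. The score is used only for ordering, consistent with the remark that the product is not strictly a probability.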

3.3 Quantitative Evaluation 
3.3.1 Bracketing 

We evaluated parser accuracy on the unseen
test corpus with respect to the phrasal bracketing
annotation standard described by Carroll
et al. (1997) rather than the original SUSANNE
bracketings, since the analyses assigned by the
grammar and by the corpus differ for many
constructions^2. However, with the exception of
SUSANNE 'verb groups' our annotation standard
is bracket-consistent with the treebank analyses
(i.e. no 'crossing brackets'). Table 2 shows
the baseline accuracy of the parser with respect
to (unlabelled) bracketings, and also the accuracy
of this model when augmented with the extracted
subcategorisation information. Briefly, the
evaluation metrics compare unlabelled bracketings
derived from the test treebank with those derived
from parses, computing recall, the ratio
of matched brackets over all brackets in the
treebank; precision, the ratio of matched brackets
over all brackets found by the parser; mean
crossings, the number of times a bracketed
sequence output by the parser overlaps with one
from the treebank but neither is properly contained
in the other, averaged over all sentences;
and zero crossings, the percentage of sentences
for which the analysis returned has zero crossings
(see Grishman, Macleod & Sterling, 1992).

Table 2: Bracketing evaluation measures, before and after incorporation of subcat information

^2 Our previous attempts to produce SUSANNE annotation
scheme analyses were not entirely successful, since
SUSANNE does not have an underlying grammar, or even
a formal description of the possible bracketing
configurations. Our evaluation results were often more
sensitive to the exact mapping we used than to changes
we made to the parsing system itself.

Since the test corpus contains only in- 
coverage sentences our results are relative to the 
80% or so of sentences that can be parsed. In 
experiments measuring the coverage of our sys- 
tem (Carroll & Briscoe, 1996), we found that 
the mean length of failing sentences was lit- 
tle different to that of successfully parsed ones. 
We would therefore argue that the remaining 
20% of sentences are not significantly more com- 
plex, and therefore our results are not skewed 
due to parse failures. Indeed, in these experi- 
ments a fair proportion of unsuccessfully parsed 
sentences were elliptical noun or prepositional 
phrases, fragments from dialogue and so forth, 
which we do not attempt to cover. 

On these measures, there is no significant dif- 
ference between the baseline and lexicalised ver- 
sions of the parser. In particular, the mean 
crossing rates per sentence are almost identical. 
This is in spite of the fact that the two versions 
return different highest-ranked analyses for 30% 
of the sentences in the test corpus. The reason 
for the similarity in scores appears to be that the 
annotation scheme and evaluation measures are 
relatively insensitive to argument/adjunct and
attachment distinctions. For example, in the 
sentence (2) from the test corpus 

(2) Salem (AP) - the statewide meeting of
war mothers Tuesday in Salem will hear a 
greeting from Gov. Mark Hatfield. 

the phrasal analyses returned by the baseline 
and lexicalised parsers are, respectively (3a) and 
(3b). 

(3) a ... (VP will hear (NP a greeting) (PP
       from (NP Gov. Mark Hatfield))) ...

    b ... (VP will hear (NP a greeting (PP
       from (NP Gov. Mark Hatfield)))) ...

The latter is correct, but the former, incorrectly
taking the PP to be an argument of the
verb, is penalised only lightly by the evaluation
measures: it has zero crossings, and 75%
recall and precision. This type of annotation
and evaluation scheme may be appropriate for
a phrasal parser, such as the baseline version of
the parser, which does not have the knowledge
to resolve such ambiguities. Unfortunately, it
masks differences between such a phrasal parser
and one which can use lexical information to
make informed decisions between complementation
and modification possibilities^3.

3.3.2 Grammatical Relation 

We therefore also evaluated the baseline and 
lexicalised parser against the 500 test sentences 
marked up in accordance with a second, gram- 
matical relation-based (GR) annotation scheme 
(described in detail by Carroll, Briscoe & San- 
filippo, 1998). 

^3 Shortcomings of this combination of annotation and
evaluation scheme have been noted previously by Lin
(1996), Carpenter & Manning (1997) and others.
Carroll, Briscoe & Sanfilippo (1998) summarise the
various criticisms that have been made.

Figure 1: Portions of GR hierarchy used. (Relations in italics are not returned by the parser).

In general, grammatical relations (GRs) are
viewed as specifying the syntactic dependency
which holds between a head and a dependent.
The set of GRs form a hierarchy; the ones we are
concerned with are shown in Figure 1. Subj(ect)
GRs divide into clausal (xsubj/csubj), and
non-clausal (ncsubj) relations. Comp(lement) GRs
divide into clausal, and into non-clausal direct
object (dobj), second (non-clausal) complement
in ditransitive constructions (obj2), and indirect
object complement introduced by a preposition
(iobj). In general the parser returns the
most specific (leaf) relations in the GR hierarchy,
except when it is unable to determine
whether clausal subjects/objects are controlled
from within or without (i.e. csubj vs. xsubj, and
ccomp vs. xcomp respectively), in which case it
returns subj or clausal as appropriate. Each
relation is parameterised with a head (lemma)
and a dependent (lemma), and optionally also a
type and/or specification of grammatical function.
For example, the sentence (4a) would be
marked up as in (4b).

(4) a Paul intends to leave IBM.

    b ncsubj(intend, Paul, _)
      xcomp(to, intend, leave)
      ncsubj(leave, Paul, _)
      dobj(leave, IBM, _)

Carroll, Briscoe & Sanfilippo (1998) justify this 
new evaluation annotation scheme and compare 
it with others (constituent- and dependency- 
based) that have been proposed in the litera- 
ture. 

The relatively large size of the test corpus 
has meant that to date we have in some cases 
not distinguished between c/xsubj and between 
c/xcomp, and we have not marked up modifi- 
cation relations; we thus report evaluation with 
respect to argument relations only (but includ- 
ing the relation arg_mod — a semantic argument 
which is syntactically realised as a modifier, 
such as the passive 'by-phrase'). The mean 
number of GRs per sentence in the test corpus 
is 4.15. 

When computing matches between the GRs 
produced by the parser and those in the corpus 
annotation, we allow a single level of subsump- 
tion: a relation from the parser may be one 
level higher in the GR hierarchy than the ac- 
tual correct relation. For example, if the parser 
returns clausal, this is taken to match both the 
more specific xcomp and ccomp. Also, an un- 
specified filler (_) for the type slot in the iobj 
and clausal relations successfully matches any 
actual specified filler. The head slot fillers are in
all cases the base forms of single head words, so
for example, 'multi-component' heads, such as
the names of people, places or organisations are
reduced to one word; thus the slot filler
corresponding to Mr. Bill Clinton would be Clinton.
For real-world applications this might not be the
desired behaviour: one might instead want the
token Mr._Bill_Clinton. This could be achieved
by invoking a processing phase similar to the
conventional 'named entity' identification task
in information extraction.

Considering the previous example (2), but 
this time with respect to GRs, the sets returned 
by the baseline and lexicalised parsers are (5a) 
and (5b), respectively. 

(5) a ncsubj(hear, meeting, _)
      dobj(hear, greeting, _)
      iobj(from, hear, Hatfield)

    b ncsubj(hear, meeting, _)
      dobj(hear, greeting, _)

The latter is correct, but the former, incor- 
rectly taking the PP to be an argument of the 
verb, hear, is penalised more heavily than in the 
bracketing annotation and evaluation schemes: 
it gets only 67% precision. There is also no
misleadingly low crossing score since there is no
analogue to this in the GR scheme. 
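
The matching rules and the set-based recall and precision used in the GR evaluation can be sketched as follows, applied to the sets in (5). The tuple encoding of a GR as (relation, slots), the partial PARENT table (only the hierarchy links mentioned in the text) and the function names are assumptions made for illustration.

```python
# Links in the GR hierarchy that the text mentions explicitly.
PARENT = {"xcomp": "clausal", "ccomp": "clausal", "xsubj": "subj", "csubj": "subj"}
# Relations whose first slot is a type, as in xcomp(to, intend, leave).
TYPED = {"iobj", "clausal", "xcomp", "ccomp"}


def gr_match(parsed, gold):
    """True if a parser GR matches a gold GR under the matching rules in the text."""
    (p_rel, p_args), (g_rel, g_args) = parsed, gold
    if p_rel != g_rel and PARENT.get(g_rel) != p_rel:
        return False                  # at most one level of subsumption
    if len(p_args) != len(g_args):
        return False
    for i, (p, g) in enumerate(zip(p_args, g_args)):
        if i == 0 and p_rel in TYPED and p == "_":
            continue                  # unspecified type filler matches anything
        if p != g:
            return False
    return True


def gr_scores(parsed, gold):
    """Recall and precision of a set of parser GRs against the gold GRs."""
    recall = sum(any(gr_match(p, g) for p in parsed) for g in gold) / len(gold)
    precision = sum(any(gr_match(p, g) for g in gold) for p in parsed) / len(parsed)
    return recall, precision


gold = [("ncsubj", ("hear", "meeting", "_")), ("dobj", ("hear", "greeting", "_"))]
baseline = gold + [("iobj", ("from", "hear", "Hatfield"))]
print(gr_scores(baseline, gold))   # (5a) scored against (5b)
print(gr_scores(gold, gold))       # (5b) scored against itself
```

Because each GR is scored as a unit, a single spurious argument relation changes the score directly, with no nesting effect to soften it as in the bracketing measures.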

Table 3: ... after incorporation of subcategorisation
information. Argument relations only.

Table 5: Numbers of errors of each type made
by the lexicalised parser.

Table 3 gives the result of evaluating the baseline
and lexicalised versions of the parser on the
GR annotation. The measures compare the set
of GRs in the annotated test corpus with those
returned by the parser, in terms of recall, the
percentage of GRs correctly found by the parser
out of all those in the treebank; and precision,
the percentage of GRs returned by the parser
that are actually correct. In the evaluation, GR
recall of the lexicalised parser drops by 0.5%
compared with the baseline, while precision
increases by 9.0%. The drop in recall is not
statistically significant at the 95% level (paired
t-test, 1.46, 499 df, p > 0.1), whereas the increase
in precision is significant even at the 99.95% level
(paired t-test, 5.14, 499 df, p < 0.001).
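
For readers unfamiliar with the test, a paired t statistic can be computed as in the sketch below. The 499 degrees of freedom suggest one paired observation per test sentence, but that pairing is an inference, and the five per-sentence values used here are hypothetical rather than data from the experiment.

```python
import math
from statistics import mean, stdev


def paired_t(xs, ys):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [y - x for x, y in zip(xs, ys)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample standard deviation).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical per-sentence precision scores for five sentences.
baseline = [0.50, 0.75, 0.60, 0.80, 0.70]
lexicalised = [0.75, 0.75, 0.80, 1.00, 0.80]
t, df = paired_t(baseline, lexicalised)
print(round(t, 2), df)
```

The statistic is then compared against the t distribution with the given degrees of freedom to obtain the significance level.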

Table 4 gives the number of each type of GR
returned by the two models, compared with the 
correct numbers in the test corpus. The base- 
line parser returns a mean of 4.65 relations per 
sentence, whereas the lexicalised parser returns 
only 4.15, the same as the test corpus. This 
is further, indirect evidence that the lexicalised 
probabilistic system models the data more ac- 
curately. 

3.4 Discussion 

In addition to the quantitative analysis of parser 
accuracy reported above, we have also per- 
formed a qualitative analysis of the errors made. 
We looked at each of the errors made by the lexi- 
calised version of the parser on the 500-sentence 
test corpus, and categorised them into errors 
concerning: complementation, modification, co- 
ordination, structural attachment of textual ad- 
juncts, and phrase-internal misbracketing. Of 
course, multiple errors within a given sentence 
may interact, in the sense that one error may so 
disrupt the structure of an analysis that it nec- 
essarily leads to one or more other errors being 
made. In all cases, though, we considered all 
of the errors and did not attempt to determine 
whether or not one of them was the 'root cause'. 
Table 5 summarises the number of errors of each
type over the test corpus. 

Typical examples of the five error types iden- 
tified are: 

complementation: "... decried the high rate of
unemployment in the state" misanalysed as
"decry" followed by an NP and a PP complement;

modification: in "... surveillance of the pricing
practices of the concessionaires for the purpose
of keeping the prices reasonable", the
PP modifier "for the purpose of ..." attached
'low' to "concessionaires" rather than 'high'
to "surveillance";

co-ordination: the NP "priests, soldiers, and
other members of the party" misanalysed as
just two conjuncts, with the first conjunct
containing the first two words in apposition;

textual: in "But you want a job guaranteed when
you return, I continued my attack", the (textual)
adjunct "I ... attack" attached to the
VP "guaranteed ... return" rather than the S
"But ... return"; and

misbracketing: "Nowhere in Isfahan is this rich
aesthetic life of the Persians ..." has "of"
misanalysed as a particle, with "the Persians"
becoming a separate NP.

There are no obvious trends within each type 
of error, although some particularly numerous 
sub-types can be identified. In 8 of the 30 cases 
of textual misanalysis, a sentential textual ad- 
junct preceded by a comma was attached too 
low. The most common type of modification error
was (in 20 of the 134 cases) misattachment
of a PP modifier of N to a higher VP. The majority
of the complementation errors were verbal,
accounting for 115 of the total of 124. In
15 cases of incorrect verbal complementation a 
passive construction was incorrectly analysed as 
active, often with a following 'by' prepositional 
phrase erroneously taken to be a complement. 

Other shortcomings of the system were evident
in the treatment of co-ordinated verbal
heads, and of phrasal verbs. The grammatical
relation extraction module is currently unable
to return GRs in which the verbal head alone
appears in the sentence as a conjunct, as in the
VP ... to challenge and counter-challenge the
authentication. This can be remedied fairly easily.
Phrasal verbs, such as to consist of, are identified
as such by the subcategorisation acquisition
system. The grammar used by the shallow
parser analyses phrasal verbs in two stages:
firstly the verb itself and the following particle
are combined to form a sub-constituent, and
then phrasal complements are attached. The
simple mapping from VSUBCAT values to
subcategorisation classes cannot cope with the
second level of embedding of phrasal verbs, so these
verbs do not pick up any lexical information at
parse time.

4 Conclusions 

We surveyed recent work on automatic acquisi- 
tion from corpora of subcategorisation and as- 
sociated frequency information. We described 
an experiment with a wide-coverage statistical 
grammar and parser for English and subcate- 
gorisation frequencies acquired from 10 million 
words of text which shows that this information 
can significantly improve the accuracy of recov- 
ery of grammatical relation specifications from 
a test corpus of 500 sentences covering a number 
of different genres.

Future work will include: investigating more
principled probabilistic models; addressing
immediate lower-level shortcomings in the current
system as discussed in section 3.4 above; adding
mod(ification) GR annotations to the test corpus
and extending the parser to also return
these; and working on incorporating selectional
preference information that we are acquiring in
other, related work (McCarthy, 1997).